Acquiring entailment pairs across languages and domains: A Data Analysis
نویسندگان
چکیده
Entailment pairs are sentence pairs of a premise and a hypothesis, where the premise textually entails the hypothesis. Such sentence pairs are important for the development of Textual Entailment systems. In this paper, we take a closer look at a prominent strategy for their automatic acquisition from newspaper corpora, pairing first sentences of articles with their titles. We propose a simple logistic regression model that incorporates and extends this heuristic and investigate its robustness across three languages and three domains. We manage to identify two predictors which predict entailment pairs with a fairly high accuracy across all languages. However, we find that robustness across domains within a language is more difficult to achieve.
منابع مشابه
Large-Scale Acquisition of Entailment Pattern Pairs by Exploiting Transitivity
We propose a novel method for acquiring entailment pairs of binary patterns on a large-scale. This method exploits the transitivity of entailment and a self-training scheme to improve the performance of an already strong supervised classifier for entailment, and unlike previous methods that exploit transitivity, it works on a largescale. With it we acquired 138.1 million pattern pairs with 70% ...
متن کاملAcquiring Data for Textual Entailment Recognition
Language resources are hardly ever large enough. Building language resources that can be used as a gold standard for semantic analysis requires effort and investment. We present a prototype for acquiring language resources by means of a language game which is a cheap but long-term method. Games employed to acquire language resources are not new. For example games with a purpose are used for col...
متن کاملA Search Task Dataset for German Textual Entailment
We present the first freely available large German dataset for Textual Entailment (TE). Our dataset builds on posts from German online forums concerned with computer problems and models the task of identifying relevant posts for user queries (i.e., descriptions of their computer problems) through TE. We use a sequence of crowdsourcing tasks to create realistic problem descriptions through summa...
متن کاملMonolingual Social Media Datasets for Detecting Contradiction and Entailment
Entailment recognition approaches are useful for application domains such as information extraction, question answering or summarisation, for which evidence from multiple sentences needs to be combined. We report on a new 3-way judgement Recognizing Textual Entailment (RTE) resource that originates in the Social Media domain, and explain our semi-automatic creation method for the special purpos...
متن کاملSemeval-2013 Task 8: Cross-lingual Textual Entailment for Content Synchronization
This paper presents the second round of the task on Cross-lingual Textual Entailment for Content Synchronization, organized within SemEval-2013. The task was designed to promote research on semantic inference over texts written in different languages, targeting at the same time a real application scenario. Participants were presented with datasets for different language pairs, where multi-direc...
متن کامل